Add tfidf #14406

Closed
yacinemebarki wants to merge 2 commits into TheAlgorithms:master from yacinemebarki:add-tfidf

@yacinemebarki yacinemebarki commented Mar 15, 2026

Describe your change:

Added a pure Python implementation of a TF-IDF vectorizer under machine_learning/feature_extraction/.
The vectorizer includes:

  • Vocabulary extraction from a list of documents
  • Term Frequency (TF) computation
  • Inverse Document Frequency (IDF) computation
  • TF-IDF matrix generation
  • Encoding method for new documents

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Add or change doctests?
  • Documentation change?
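
The steps listed in the description (vocabulary extraction, TF, IDF, and encoding of new documents) could look roughly like the sketch below. This is not the code from the PR, which is not shown on this page. The function names `tokenize`, `fit_idf`, and `encode`, the whitespace tokenizer, and the unsmoothed formula idf(t) = log(N / df(t)) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed normalization)."""
    return text.lower().split()


def fit_idf(documents: list[str]) -> dict[str, float]:
    """Return idf(t) = log(N / df(t)) for every term seen in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term at most once per document.
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def encode(document: str, idf: dict[str, float]) -> dict[str, float]:
    """Return tf(t, d) * idf(t) for the known terms of one document."""
    tokens = tokenize(document)
    counts = Counter(tokens)
    return {
        term: (count / len(tokens)) * idf[term]
        for term, count in counts.items()
        if term in idf  # terms outside the fitted vocabulary are skipped
    }


corpus = ["the cat sat", "the dog sat", "the cat ran"]
idf = fit_idf(corpus)
weights = encode("the cat cat", idf)
```

In this sketch a term that occurs in every document, such as "the", gets an IDF of zero and so contributes nothing to the encoding. Smoothed variants of the formula avoid that, at the cost of a slightly different weighting.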

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation. (https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
  • If this pull request resolves one or more open issues then the description above includes the issue number(s) with a closing keyword: "Fixes #ISSUE-NUMBER".

@algorithms-keeper algorithms-keeper bot added the "require tests" (Tests [doctest/unittest/pytest] are required) and "require type hints" (https://docs.python.org/3/library/typing.html) labels Mar 15, 2026

@algorithms-keeper algorithms-keeper bot left a comment

Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.

algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

  • @algorithms-keeper review to trigger the checks for only added pull request files
  • @algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.

# to separate words and normalize them


def decompose(text):

As there is no test file in this pull request nor any test function or class in the file machine_learning/feature_extraction/tf-idf.py, please provide doctest for the function decompose

Please provide return type hint for the function: decompose. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: text
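
One way the three requests above (a doctest, a return type hint, and a parameter type hint) might be addressed for `decompose` is sketched here. The function body is not visible in this review, so the regex-based normalization below is an assumption, as is the `list[str]` return type; only the function name and the `text` parameter come from the diff.

```python
import re


def decompose(text: str) -> list[str]:
    """
    Split text into lowercase word tokens.

    >>> decompose("Hello, World!")
    ['hello', 'world']
    >>> decompose("")
    []
    """
    # Assumed normalization: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```

The same pattern (annotated parameters, an annotated return value, and at least one doctest in the docstring) would apply to `compute_tf` and `encode`, and `__init__` would be annotated as returning `None`.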


# creating tfidf class
class TfIdfVectorizer:
    def __init__(self):

Please provide return type hint for the function: __init__. If the function does not return a value, please provide the type hint as: def function() -> None:

        self.idf = None

    # this method computes the TF for each word in the given data
    def compute_tf(self, data):

As there is no test file in this pull request nor any test function or class in the file machine_learning/feature_extraction/tf-idf.py, please provide doctest for the function compute_tf

Please provide return type hint for the function: compute_tf. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: data


        return np.array(tfidf, dtype=float)

    def encode(self, data):

As there is no test file in this pull request nor any test function or class in the file machine_learning/feature_extraction/tf-idf.py, please provide doctest for the function encode

Please provide return type hint for the function: encode. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: data

@algorithms-keeper algorithms-keeper bot added the "awaiting reviews" (This PR is ready to be reviewed) label Mar 15, 2026

Labels

"awaiting reviews" (This PR is ready to be reviewed), "require tests" (Tests [doctest/unittest/pytest] are required), "require type hints" (https://docs.python.org/3/library/typing.html)